The new baseline for high dimensional dataset by ranked mutual information features

Authors

Abstract

Feature selection is the process of selecting a group of relevant features, removing unnecessary ones, for use in constructing a predictive model. However, high-dimensional data makes feature selection more difficult because of the curse of dimensionality. In past research, the performance of a model has always been compared with existing results. When attempting a new dataset, the current practice is to benchmark against the dataset with all of its features included, redundant and noisy features alike. Here we propose a new optimal baseline obtained by ranking features by their mutual information score. The quality of a dataset depends on the information it contains: the more information it contains, the better. The number of features needed to achieve this baseline is obtained at the same time and serves as a guideline for the number of features a feature selection method needs. We also show experimental results indicating that the proposed method provides a baseline with fewer features.
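As a rough illustration of the ranking idea, the sketch below scores each feature by its mutual information with the class label, I(X; Y) = Σ p(x, y) log2[p(x, y) / (p(x) p(y))], and then evaluates every top-k prefix of the ranking to find the smallest k with the best score. The abstract does not state which estimator, model, or selection rule the authors use, so everything here is an assumption. Features are taken to be discrete, MI is the plug-in estimate from counts, and `evaluate` is a hypothetical scoring callback (here leave-one-out 1-nearest-neighbour accuracy). The function names are likewise hypothetical.

```python
from collections import Counter
from math import log2
from typing import Callable, List, Sequence, Tuple


def mutual_information(feature: Sequence, labels: Sequence) -> float:
    """Plug-in estimate of I(X; Y) in bits for a discrete feature X and label Y."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))) rewritten in terms of raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(X: Sequence[Sequence], y: Sequence) -> Tuple[List[int], List[float]]:
    """Return column indices sorted by descending MI, plus the per-column MI scores."""
    n_features = len(X[0])
    scores = [mutual_information([row[j] for row in X], y) for j in range(n_features)]
    # sorted() is stable even with reverse=True, so equal scores keep column order
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


def ranked_baseline(
    X: Sequence[Sequence], y: Sequence, evaluate: Callable[[list, Sequence], float]
) -> Tuple[List[int], float]:
    """Score each top-k prefix of the MI ranking; return the best prefix and its score."""
    order, _ = rank_features(X, y)
    best_k, best_score = 1, float("-inf")
    for k in range(1, len(order) + 1):
        subset = [[row[j] for j in order[:k]] for row in X]
        score = evaluate(subset, y)
        if score > best_score:  # strict '>' keeps the smallest k on ties
            best_k, best_score = k, score
    return order[:best_k], best_score


def loo_1nn_accuracy(X: Sequence[Sequence], y: Sequence) -> float:
    """Hypothetical evaluator: leave-one-out accuracy of 1-NN with Hamming distance."""
    correct = 0
    for i, row in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        # min() returns the first index among equally distant neighbours
        nearest = min(others, key=lambda j: sum(a != b for a, b in zip(row, X[j])))
        correct += y[nearest] == y[i]
    return correct / len(X)


if __name__ == "__main__":
    # Toy data: column 0 equals the label, column 1 is unrelated, column 2 is constant
    X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
    y = [0, 0, 1, 1]
    print(ranked_baseline(X, y, loo_1nn_accuracy))  # prints ([0], 1.0)
```

On ties, the strict `>` comparison keeps the smallest k, in the spirit of a baseline that needs fewer features. Continuous features would need binning or a nearest-neighbour MI estimator in place of the counting estimate used here.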

Similar resources

Lag selection for regression models using high-dimensional mutual information

Mutual information may be used to select the embedding lag of a time series. However, this lag selection is usually limited to the analysis of the mutual information between a pair of lagged values in the series. In this paper, generalized mutual information estimators are proposed to take into account more than two variables in the lag selection. Experimental results show that lag selection us...

Quantum mutual information capacity for high-dimensional entangled states.

High-dimensional Hilbert spaces used for quantum communication channels offer the possibility of large data transmission capabilities. We propose a method of characterizing the channel capacity of an entangled photonic state in high-dimensional position and momentum bases. We use this method to measure the channel capacity of a parametric down-conversion state by measuring in up to 576 dimensio...

A New Indexing Method for High Dimensional Dataset

Indexing high dimensional datasets has attracted extensive attention from many researchers in the last decade. Since R-tree type of index structures are known as suffering “curse of dimensionality” problems, Pyramid-tree type of index structures, which are based on the B-tree, have been proposed to break the curse of dimensionality. However, for high dimensional data, the number of pyramids is ...

Classification of Chronic Kidney Disease Patients via k-important Neighbors in High Dimensional Metabolomics Dataset

Background: Chronic kidney disease (CKD), characterized by progressive loss of renal function, is becoming a growing problem in the general population. New analytical technologies such as “omics”-based approaches, including metabolomics, provide a useful platform for biomarker discovery and improvement of CKD management. In metabolomics studies, not only prediction accuracy is ...

Study of cohesive devices in the textbook of English for the students of psychology by Rastegarpour

This study investigates the cohesive devices used in the textbook of English for the students of psychology. The research questions and hypotheses in the present study are based on the frequency and distribution of grammatical and lexical cohesive devices. Then, to answer the questions, all grammatical and lexical cohesive devices in reading comprehension passages from 6 units of 21 units th...

Journal

Journal title: ITM Web of Conferences

Year: 2021

ISSN: 2271-2097, 2431-7578

DOI: https://doi.org/10.1051/itmconf/20213601014